8th November, 2019
This introduction to causal inference is based on this talk.
\[ \begin{aligned} X &:= \epsilon_X \\[.5em] Y &:= X + \epsilon_Y \\[.5em] Z &:= Y + \epsilon_Z \enspace , \end{aligned} \]
\[ p(X_1, \ldots, X_n) = \prod_{i=1}^n p(X_i \mid \text{Parents}(X_i)) \hspace{6em} p(X, Y, Z) = p(Z \mid Y) \, p(Y \mid X) \, p(X) \]
set.seed(1)
n <- 100
x <- rnorm(n, 0, 1)
y <- x + rnorm(n, 0, 1)
z <- y + rnorm(n, 0, 0.1)
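One testable consequence of the factorization \(p(X, Y, Z) = p(Z \mid Y) \, p(Y \mid X) \, p(X)\) is that \(X\) and \(Z\) are independent given \(Y\). The following is a minimal sketch of that check, not a general independence test: it assumes a larger sample than the one above and uses `lm` slopes as a stand-in for (conditional) dependence, which is reasonable here only because the model is linear and Gaussian.

```r
set.seed(1)
n <- 10000  # assumption: larger n than in the text, to reduce sampling noise
x <- rnorm(n, 0, 1)
y <- x + rnorm(n, 0, 1)
z <- y + rnorm(n, 0, 0.1)

# Marginally, X predicts Z: the slope should be close to 1
b_marginal <- coef(lm(z ~ x))[["x"]]

# Given Y, X carries no further information about Z: slope close to 0
b_conditional <- coef(lm(z ~ x + y))[["x"]]

c(marginal = b_marginal, conditional = b_conditional)
```

The marginal slope reflects the path \(X \rightarrow Y \rightarrow Z\); conditioning on \(Y\) blocks it.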
\[ ACE(Z \rightarrow Y) = \mathbb{E}\left[Y \mid do(Z = z + 1) \right] - \mathbb{E}\left[Y \mid do(Z = z) \right] \enspace . \]
set.seed(1)
get_ce <- function(zvalue) {
  n <- 100
  x <- rnorm(n, 0, 1)
  y <- x + rnorm(n, 0, 1)
  z <- zvalue # do(Z = zvalue): Z no longer depends on Y
  mean(y) # E[Y | do(Z = zvalue)]
}
(ACE <- get_ce(1) - get_ce(0))
## [1] -0.0101961
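For contrast, the observational association between \(Z\) and \(Y\) is strong even though intervening on \(Z\) does not move \(Y\). The sketch below assumes a larger sample than the text uses; in the population the slope equals \(\text{Var}(Y) / \text{Var}(Z) = 2 / 2.01\).

```r
set.seed(1)
n <- 10000  # assumption: larger n than in the text
x <- rnorm(n, 0, 1)
y <- x + rnorm(n, 0, 1)
z <- y + rnorm(n, 0, 0.1)

# Observational slope of Y on Z; close to 2 / 2.01, not to 0
slope_obs <- coef(lm(y ~ z))[["z"]]
slope_obs
```

Reading this slope causally would suggest that changing \(Z\) changes \(Y\), which the intervention shows is not the case.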
set.seed(1)
get_ce <- function(xvalue) {
  n <- 100
  x <- xvalue # do(X = xvalue): X is fixed by intervention
  y <- x + rnorm(n, 0, 1)
  z <- y + rnorm(n, 0, 0.1) # Z still follows its structural equation
  mean(y) # E[Y | do(X = xvalue)]
}
(ACE <- get_ce(1) - get_ce(0))
## [1] 1.079214
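The population value can be read off the SCM. Under \(do(X = x)\) the assignment for \(Y\) becomes \(Y := x + \epsilon_Y\), and the noise has mean zero (as in the simulation), so:

\[ \begin{aligned} \mathbb{E}\left[Y \mid do(X = x)\right] &= x + \mathbb{E}\left[\epsilon_Y\right] = x \\[.5em] ACE(X \rightarrow Y) &= (x + 1) - x = 1 \enspace . \end{aligned} \]

The simulated estimate differs from 1 only through sampling noise, since each call averages just 100 draws.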
Backdoor Criterion (Pearl, Glymour, & Jewell, 2016, p. 61): An adjustment set \(\mathcal{Z}\) fulfills the backdoor criterion if no member of \(\mathcal{Z}\) is a descendant of \(X\) and the members of \(\mathcal{Z}\) block all paths between \(X\) and \(Y\) that contain an arrow into \(X\). Adjusting for \(\mathcal{Z}\) thus yields the causal effect of \(X \rightarrow Y\).
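To illustrate the criterion, here is a minimal sketch with a hypothetical confounder `w` that causes both `x` and `y`. The variable `w`, the coefficients, and the sample size are all assumptions made for this example; they do not come from the models in the text. The true causal effect of `x` on `y` is set to 1, and `w` satisfies the backdoor criterion because it is not a descendant of `x` and blocks the only backdoor path \(X \leftarrow W \rightarrow Y\).

```r
set.seed(1)
n <- 10000                      # assumption: illustrative sample size
w <- rnorm(n, 0, 1)             # hypothetical confounder
x <- w + rnorm(n, 0, 1)         # W -> X
y <- x + 2 * w + rnorm(n, 0, 1) # X -> Y (effect 1) and W -> Y

# Unadjusted: the backdoor path is open, so the slope is biased (about 2)
b_naive <- coef(lm(y ~ x))[["x"]]

# Adjusted for w: the backdoor path is blocked, slope is about 1
b_adjusted <- coef(lm(y ~ x + w))[["x"]]

c(naive = b_naive, adjusted = b_adjusted)
```

The unadjusted slope mixes the causal effect with the association flowing through `w`; only the adjusted slope estimates the effect of intervening on `x`.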
set.seed(1)
n <- 1000
x <- rnorm(n, 0, 1)
y <- rnorm(n, 0, 1)
z <- x + y + rnorm(n, 0, 1)
coef(lm(y ~ 0 + x))
##           x
## 0.006608738
coef(lm(y ~ 0 + x + z))
##          x          z
## -0.5262511  0.5047807
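The coefficients of roughly \(-0.5\) and \(0.5\) match what the population regression of \(Y\) on \(X\) and \(Z\) gives for this simulation, where \(X\), \(Y\), and the noise in \(Z\) are independent standard normals. Here \(\beta\) denotes the vector of population coefficients on \((X, Z)\), a symbol introduced only for this derivation:

\[ \begin{aligned} \text{Var}(X) &= 1, \quad \text{Var}(Z) = 3, \quad \text{Cov}(X, Z) = 1, \quad \text{Cov}(Y, X) = 0, \quad \text{Cov}(Y, Z) = 1 \\[.5em] \beta &= \begin{pmatrix} 1 & 1 \\ 1 & 3 \end{pmatrix}^{-1} \begin{pmatrix} 0 \\ 1 \end{pmatrix} = \frac{1}{2} \begin{pmatrix} 3 & -1 \\ -1 & 1 \end{pmatrix} \begin{pmatrix} 0 \\ 1 \end{pmatrix} = \begin{pmatrix} -0.5 \\ 0.5 \end{pmatrix} \enspace . \end{aligned} \]

Conditioning on the collider \(Z\) therefore induces a negative association between \(X\) and \(Y\), although they are independent by construction.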
\[ \begin{aligned} S &:= \epsilon_S \\[.5em] T &:= S + \epsilon_T \\[.5em] U &:= W + \epsilon_U \\[.5em] V &:= 0.90 Z + \epsilon_V \\[.5em] W &:= 0.50 S + 1 X + 3 Y + \epsilon_W \\[.5em] X &:= Z + \epsilon_X \\[.5em] Y &:= 1.50V + X + \epsilon_Y \\[.5em] Z &:= \epsilon_Z \end{aligned} \]
\[ (\epsilon_S, \epsilon_T, \epsilon_U, \epsilon_V, \epsilon_W, \epsilon_X, \epsilon_Y, \epsilon_Z) \stackrel{iid}{\sim} \mathcal{N}(0, 1) \]
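A minimal sketch for drawing data from this SCM follows. The sample size, the helper `eps`, and the list `d` are assumptions made for illustration. Variables are generated in an order in which every parent exists before its child.

```r
set.seed(1)
n <- 10000                      # assumption: illustrative sample size
eps <- function() rnorm(n, 0, 1) # hypothetical helper: one N(0, 1) noise vector

d <- list()
d$S <- eps()
d$Z <- eps()
d$T <- d$S + eps()
d$V <- 0.90 * d$Z + eps()
d$X <- d$Z + eps()
d$Y <- 1.50 * d$V + d$X + eps()
d$W <- 0.50 * d$S + 1 * d$X + 3 * d$Y + eps()
d$U <- d$W + eps()
d <- as.data.frame(d)

head(d)
```

With the data in `d`, candidate adjustment sets can be compared by fitting `lm` with different covariates and checking the estimates against the structural coefficients above.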
Download the (made-up) observational data set about mental health, exercise, and age from here.
Draw a DAG that could underlie these data. Which analysis is the correct one?